
    Near-Optimal Algorithms for Differentially-Private Principal Components

    Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals, and algorithms that operate on such data should be sensitive to the privacy risks of publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from that of the existing procedure in its scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and ours.
    Comment: 37 pages, 8 figures; final version to appear in the Journal of Machine Learning Research, preliminary version was at NIPS 201
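    The abstract does not spell out the paper's optimized mechanism. As a minimal sketch, here is the kind of input-perturbation baseline such methods are compared against: add symmetric Gaussian noise to the empirical covariance and take the top-k eigenvectors. The function name and the noise calibration (assuming rows of norm at most 1, sensitivity roughly 2/n) are illustrative assumptions, not the paper's algorithm.

    ```python
    import numpy as np

    def private_pca_baseline(X, k, epsilon, delta, rng=None):
        """Baseline differentially private PCA via input perturbation:
        add symmetric Gaussian noise to the empirical covariance, then
        return its top-k eigenvectors. Assumes each row of X has norm
        at most 1; the noise scale below is an illustrative Gaussian-
        mechanism calibration, not the paper's optimized method."""
        rng = np.random.default_rng(rng)
        n, d = X.shape
        A = X.T @ X / n                         # empirical covariance
        # Illustrative calibration: sensitivity ~ 2/n under row swap.
        sigma = 2.0 * np.sqrt(2.0 * np.log(1.25 / delta)) / (n * epsilon)
        E = rng.normal(0.0, sigma, size=(d, d))
        E = (E + E.T) / np.sqrt(2.0)            # symmetrize the noise
        vals, vecs = np.linalg.eigh(A + E)
        order = np.argsort(vals)[::-1][:k]      # indices of top-k eigenvalues
        return vecs[:, order]
    ```

    The output is an orthonormal d-by-k basis; utility is typically measured by how much of the data's variance this noisy subspace captures compared with exact PCA.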

    Learning from Data with Heterogeneous Noise using SGD

    We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Experiments on real data show that our method performs better than using a single learning rate, and better than using only the less noisy of the two datasets when the noise level is low to moderate.
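    The abstract does not give the paper's learning-rate schedule. A sketch of the underlying idea, for least-squares SGD over sources of differing label noise: the step size for each source is scaled down as its noise level grows. The 1/(1 + sigma^2) scaling and all names here are heuristic assumptions for illustration, not the schedule the paper derives.

    ```python
    import numpy as np

    def sgd_heterogeneous(datasets, w0, base_lr=0.1):
        """Least-squares SGD over data sources of varying label noise.
        Each dataset is a tuple (X, y, sigma), where sigma is the
        (known) label-noise level of that source. As a heuristic,
        the per-sample step size is scaled by 1 / (1 + sigma^2), so
        noisier sources move the iterate less; the paper derives
        principled noise-dependent schedules with regret bounds."""
        w = np.asarray(w0, dtype=float).copy()
        t = 0
        for X, y, sigma in datasets:
            scale = 1.0 / (1.0 + sigma**2)
            for xi, yi in zip(X, y):
                t += 1
                lr = base_lr * scale / np.sqrt(t)   # decaying, noise-aware
                grad = (xi @ w - yi) * xi           # squared-loss gradient
                w -= lr * grad
        return w
    ```

    With this kind of scaling, a clean source dominates the fit while a noisy source still contributes small corrections, echoing the abstract's point that the right way to combine sources depends on the learning rate.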

    Auditing: Active Learning with Outcome-Dependent Query Costs

    We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation is applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also discuss a general competitive approach for auditing and possible modifications to the framework.
    Comment: Corrections in section
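    For the threshold class, the gap between auditing complexity and active label complexity can be sketched in a few lines: scanning candidate points from the positive side downward pays for exactly one negative label, whereas a binary search may pay for several. The `query` interface and names are illustrative; the paper's algorithms additionally control relative error and cover richer classes such as rectangles.

    ```python
    def audit_threshold(points, query):
        """Learn a one-sided threshold (label 1 iff x >= theta) in the
        auditing setting, where only negative labels cost anything.
        Scanning from the largest point downward stops at the first
        negative answer, so the auditing cost is exactly one negative
        label. `query(x)` returns the true binary label of x; this is
        an illustrative sketch, not the paper's full algorithm."""
        negatives_paid = 0
        threshold = None
        for x in sorted(points, reverse=True):
            if query(x) == 1:
                threshold = x          # smallest point known to be positive
            else:
                negatives_paid += 1    # the single label we pay for
                break
        return threshold, negatives_paid
    ```

    The scan uses many queries overall, but only one of them is a paid (negative) query, which is the quantity the auditing complexity counts.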

    Generalized Opinion Dynamics from Local Optimization Rules

    We study generalizations of the Hegselmann-Krause (HK) model for opinion dynamics, incorporating features and parameters that are natural components of observed social systems. The first generalization is one where the strength of influence depends on the distance between the agents' opinions. Under this setup, we identify conditions under which the opinions converge in finite time, and provide a qualitative characterization of the equilibrium. We interpret the HK model's opinion update rule as a quadratic cost-minimization rule. This enables a second generalization: a family of update rules which possess different equilibrium properties. Subsequently, we investigate models in which an external force can behave strategically to modulate or influence user updates. We consider cases where this external force can introduce additional agents and cases where it can modify the cost structures for other agents. We describe and analyze some strategies through which such modulation may be possible in an order-optimal manner. Our simulations demonstrate that the generalized dynamics differ qualitatively and quantitatively from traditional HK dynamics.
    Comment: 20 pages, under review
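    The classical HK update, and the distance-weighted generalization described above, can be sketched as follows. Each agent replaces its opinion with a weighted average of all opinions within a confidence radius; with unit weights this is exactly the minimizer of the quadratic cost sum over neighbors, which is the cost-minimization reading the abstract mentions. The specific weight function passed in is an illustrative stand-in for the paper's influence models.

    ```python
    import numpy as np

    def hk_step(x, radius=1.0, weight=lambda d: 1.0):
        """One synchronous Hegselmann-Krause update: each agent moves
        to a weighted average of the opinions within `radius` of its
        own. weight(d) == 1 recovers classical HK (the minimizer of
        the quadratic cost over neighbors); a decreasing weight models
        influence that weakens with opinion distance."""
        x = np.asarray(x, dtype=float)
        new = np.empty_like(x)
        for i, xi in enumerate(x):
            d = np.abs(x - xi)
            mask = d <= radius                     # confidence neighborhood
            w = np.array([weight(dj) for dj in d[mask]])
            new[i] = np.sum(w * x[mask]) / np.sum(w)
        return new

    def run_hk(x0, radius=1.0, weight=lambda d: 1.0, max_steps=100, tol=1e-9):
        """Iterate hk_step until the opinion profile stops changing."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_steps):
            nxt = hk_step(x, radius, weight)
            if np.max(np.abs(nxt - x)) < tol:
                return nxt
            x = nxt
        return x
    ```

    Opinions that start in well-separated groups typically collapse into one cluster per group, the kind of finite-time convergence to clustered equilibria that the paper characterizes.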